Goto

Collaborating Authors

 conceptual knowledge


Evaluating Hydro-Science and Engineering Knowledge of Large Language Models

Hu, Shiruo, Shan, Wenbo, Li, Yingjia, Wan, Zhiqi, Yu, Xinpeng, Qi, Yunjia, Xia, Haotian, Xiao, Yang, Liu, Dingxiao, Wang, Jiaru, Gong, Chenxu, Zhang, Ruixi, Wu, Shuyue, Cui, Shibo, Lai, Chee Hui, Luo, Wei, He, Yubin, Xu, Bin, Zhao, Jianshi

arXiv.org Artificial Intelligence

Hydro-Science and Engineering (Hydro-SE) is a critical and irreplaceable domain that secures human water supply, generates clean hydropower energy, and mitigates flood and drought disasters. Featuring multiple engineering objectives, Hydro-SE is an inherently interdisciplinary domain that integrates scientific knowledge with engineering expertise. This integration necessitates extensive expert collaboration in decision-making, which poses difficulties for intelligence. With the rapid advancement of large language models (LLMs), their potential application in the Hydro-SE domain is being increasingly explored. However, the knowledge and application abilities of LLMs in Hydro-SE have not been sufficiently evaluated. To address this issue, we propose the Hydro-SE LLM evaluation benchmark (Hydro-SE Bench), which contains 4,000 multiple-choice questions. Hydro-SE Bench covers nine subfields and enables evaluation of LLMs in aspects of basic conceptual knowledge, engineering application ability, and reasoning and calculation ability. The evaluation results on Hydro-SE Bench show that the accuracy values vary among 0.74 to 0.80 for commercial LLMs, and among 0.41 to 0.68 for small-parameter LLMs. While LLMs perform well in subfields closely related to natural and physical sciences, they struggle with domain-specific knowledge such as industry standards and hydraulic structures. Model scaling mainly improves reasoning and calculation abilities, but there is still great potential for LLMs to better handle problems in practical engineering application. This study highlights the strengths and weaknesses of LLMs for Hydro-SE tasks, providing model developers with clear training targets and Hydro-SE researchers with practical guidance for applying LLMs.


Babies' brains 'tick' more slowly than ours, which may help them learn

New Scientist

Babies' brains'tick' more slowly than ours, which may help them learn The rhythm of an infant's brain activity seems to put them in constant learning mode, whereas that of an adult may allow them to retrieve conceptual knowledge Babies' brains operate at a different rhythm to those of adults When a baby tries to make sense of what they have seen, their brain activity seems to tick at a slower rhythm than it does in adults, which may help them to continually learn new concepts. Our brain processes sensory stimuli using networks of neurons. If a neuron receives a strong enough signal from another neuron, it transmits the signal to more neurons still, producing synchronised waves of electrical activity where many neurons alternate between activated and silent states. Such brainwaves occur at various frequencies. When a given brain region displays a range of frequencies simultaneously, a higher proportion of its neurons may synchronise with certain frequencies more than others.


Beyond prior knowledge: The predictive role of knowledge-building in Tutor Learning

Shahriar, Tasmia, Ameen, Mia, Mallavarapu, Aditi, Jiang, Shiyan, Matsuda, Noboru

arXiv.org Artificial Intelligence

When adopting the role of a teacher in learning-by-teaching environments, students often struggle to engage in knowledge-building activities, such as providing explanations and addressing misconceptions. Instead, they frequently default to knowledge-telling behaviors, where they simply dictate what they already know or what to do without deeper reflection, thereby limiting learning. Teachable agents, particularly those capable of posing persistent follow-up questions, have been shown to encourage students (tutors) to shift from knowledge-telling to knowledge-building and enhance tutor learning. Tutor learning encompasses two interrelated types of knowledge: conceptual and procedural knowledge. Research has established a bidirectional relationship between these knowledge types, where improvements in one reinforce the other. This study investigates the role of knowledge-building in mediating the bidirectional relationship between procedural and conceptual learning. Our findings revealed a stable bidirectional relationship between procedural and conceptual knowledge, with higher post-test scores observed among students who engaged in knowledge-building, regardless of their procedural and conceptual pre-test performance. This suggests that knowledge-building serves as a crucial mechanism bridging the gap between students with low prior knowledge and higher conceptual and procedural learning gain.


Can Large Vision-Language Models Understand Multimodal Sarcasm?

Wang, Xinyu, Zhang, Yue, Jing, Liqiang

arXiv.org Artificial Intelligence

Sarcasm is a complex linguistic phenomenon that involves a disparity between literal and intended meanings, making it challenging for sentiment analysis and other emotion-sensitive tasks. While traditional sarcasm detection methods primarily focus on text, recent approaches have incorporated multimodal information. However, the application of Large Visual Language Models (LVLMs) in Multimodal Sarcasm Analysis (MSA) remains underexplored. In this paper, we evaluate LVLMs in MSA tasks, specifically focusing on Multimodal Sarcasm Detection and Multimodal Sarcasm Explanation. Through comprehensive experiments, we identify key limitations, such as insufficient visual understanding and a lack of conceptual knowledge. To address these issues, we propose a training-free framework that integrates in-depth object extraction and external conceptual knowledge to improve the model's ability to interpret and explain sarcasm in multimodal contexts. The experimental results on multiple models show the effectiveness of our proposed framework. The code is available at https://github.com/cp-cp/LVLM-MSA.


Towards Responsible and Trustworthy Educational Data Mining: Comparing Symbolic, Sub-Symbolic, and Neural-Symbolic AI Methods

Hooshyar, Danial, Kikas, Eve, Yang, Yeongwook, Šír, Gustav, Hämäläinen, Raija, Kärkkäinen, Tommi, Azevedo, Roger

arXiv.org Artificial Intelligence

Given the demand for responsible and trustworthy AI for education, this study evaluates symbolic, sub-symbolic, and neural-symbolic AI (NSAI) in terms of generalizability and interpretability. Our extensive experiments on balanced and imbalanced self-regulated learning datasets of Estonian primary school students predicting 7th-grade mathematics national test performance showed that symbolic and sub-symbolic methods performed well on balanced data but struggled to identify low performers in imbalanced datasets. Interestingly, symbolic and sub-symbolic methods emphasized different factors in their decision-making: symbolic approaches primarily relied on cognitive and motivational factors, while sub-symbolic methods focused more on cognitive aspects, learnt knowledge, and the demographic variable of gender -- yet both largely overlooked metacognitive factors. The NSAI method, on the other hand, showed advantages by: (i) being more generalizable across both classes -- even in imbalanced datasets -- as its symbolic knowledge component compensated for the underrepresented class; and (ii) relying on a more integrated set of factors in its decision-making, including motivation, (meta)cognition, and learnt knowledge, thus offering a comprehensive and theoretically grounded interpretability framework. These contrasting findings highlight the need for a holistic comparison of AI methods before drawing conclusions based solely on predictive performance. They also underscore the potential of hybrid, human-centred NSAI methods to address the limitations of other AI families and move us closer to responsible AI for education. Specifically, by enabling stakeholders to contribute to AI design, NSAI aligns learned patterns with theoretical constructs, incorporates factors like motivation and metacognition, and strengthens the trustworthiness and responsibility of educational data mining.


RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering

Liang, Sichu, Zhang, Linhai, Zhu, Hongyu, Wang, Wenwen, He, Yulan, Zhou, Deyu

arXiv.org Artificial Intelligence

Medical question answering requires extensive access to specialized conceptual knowledge. The current paradigm, Retrieval-Augmented Generation (RAG), acquires expertise medical knowledge through large-scale corpus retrieval and uses this knowledge to guide a general-purpose large language model (LLM) for generating answers. However, existing retrieval approaches often overlook the importance of factual knowledge, which limits the relevance of retrieved conceptual knowledge and restricts its applicability in real-world scenarios, such as clinical decision-making based on Electronic Health Records (EHRs). This paper introduces RGAR, a recurrence generation-augmented retrieval framework that retrieves both relevant factual and conceptual knowledge from dual sources (i.e., EHRs and the corpus), allowing them to interact and refine each another. Through extensive evaluation across three factual-aware medical question answering benchmarks, RGAR establishes a new state-of-the-art performance among medical RAG systems. Notably, the Llama-3.1-8B-Instruct model with RGAR surpasses the considerably larger, RAG-enhanced GPT-3.5. Our findings demonstrate the benefit of extracting factual knowledge for retrieval, which consistently yields improved generation quality.


The "LLM World of Words" English free association norms generated by large language models

Abramski, Katherine, Improta, Riccardo, Rossetti, Giulio, Stella, Massimo

arXiv.org Artificial Intelligence

Free associations have been extensively used in cognitive psychology and linguistics for studying how conceptual knowledge is organized. Recently, the potential of applying a similar approach for investigating the knowledge encoded in LLMs has emerged, specifically as a method for investigating LLM biases. However, the absence of large-scale LLM-generated free association norms that are comparable with human-generated norms is an obstacle to this new research direction. To address this limitation, we create a new dataset of LLM-generated free association norms modeled after the "Small World of Words" (SWOW) human-generated norms consisting of approximately 12,000 cue words. We prompt three LLMs, namely Mistral, Llama3, and Haiku, with the same cues as those in the SWOW norms to generate three novel comparable datasets, the "LLM World of Words" (LWOW). Using both SWOW and LWOW norms, we construct cognitive network models of semantic memory that represent the conceptual knowledge possessed by humans and LLMs. We demonstrate how these datasets can be used for investigating implicit biases in humans and LLMs, such as the harmful gender stereotypes that are prevalent both in society and LLM outputs.


Discovering Conceptual Knowledge with Analytic Ontology Templates for Articulated Objects

Sun, Jianhua, Li, Yuxuan, Xu, Longfei, Wei, Jiude, Chai, Liang, Lu, Cewu

arXiv.org Artificial Intelligence

Human cognition can leverage fundamental conceptual knowledge, like geometric and kinematic ones, to appropriately perceive, comprehend and interact with novel objects. Motivated by this finding, we aim to endow machine intelligence with an analogous capability through performing at the conceptual level, in order to understand and then interact with articulated objects, especially for those in novel categories, which is challenging due to the intricate geometric structures and diverse joint types of articulated objects. To achieve this goal, we propose Analytic Ontology Template (AOT), a parameterized and differentiable program description of generalized conceptual ontologies. A baseline approach called AOTNet driven by AOTs is designed accordingly to equip intelligent agents with these generalized concepts, and then empower the agents to effectively discover the conceptual knowledge on the structure and affordance of articulated objects. The AOT-driven approach yields benefits in three key perspectives: i) enabling concept-level understanding of articulated objects without relying on any real training data, ii) providing analytic structure information, and iii) introducing rich affordance information indicating proper ways of interaction. We conduct exhaustive experiments and the results demonstrate the superiority of our approach in understanding and then interacting with articulated objects.


Reconciling Different Theories of Learning with an Agent-based Model of Procedural Learning

Rismanchian, Sina, Doroudi, Shayan

arXiv.org Artificial Intelligence

Computational models of human learning can play a significant role in enhancing our knowledge about nuances in theoretical and qualitative learning theories and frameworks. There are many existing frameworks in educational settings that have shown to be verified using empirical studies, but at times we find these theories make conflicting claims or recommendations for instruction. In this study, we propose a new computational model of human learning, Procedural ABICAP, that reconciles the ICAP, Knowledge-Learning-Instruction (KLI), and cognitive load theory (CLT) frameworks for learning procedural knowledge. ICAP assumes that constructive learning generally yields better learning outcomes, while theories such as KLI and CLT claim that this is not always true. We suppose that one reason for this may be that ICAP is primarily used for conceptual learning and is underspecified as a framework for thinking about procedural learning. We show how our computational model, both by design and through simulations, can be used to reconcile different results in the literature. More generally, we position our computational model as an executable theory of learning that can be used to simulate various educational settings.


Editing Conceptual Knowledge for Large Language Models

Wang, Xiaohan, Mao, Shengyu, Zhang, Ningyu, Deng, Shumin, Yao, Yunzhi, Shen, Yue, Liang, Lei, Gu, Jinjie, Chen, Huajun

arXiv.org Artificial Intelligence

Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations merely explore the instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of editing conceptual knowledge for LLMs, by constructing a novel benchmark dataset ConceptEdit and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing editing methods can efficiently modify concept-level definition to some extent, they also have the potential to distort the related instantial knowledge in LLMs, leading to poor performance. We anticipate this can inspire further progress in better understanding LLMs. Our project homepage is available at https://zjunlp.github.io/project/ConceptEdit.